Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games
Authors
Abstract
This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games through decentralized multi-agent reinforcement learning. Given the fundamental difficulty of calculating a Nash equilibrium (NE), we instead aim at finding a coarse correlated equilibrium (CCE), a solution concept that generalizes NE by allowing possible correlations among the agents' strategies. We propose an algorithm in which each agent independently runs optimistic V-learning (a variant of Q-learning) to explore the unknown environment, while using a stabilized online mirror descent (OMD) subroutine for policy updates. We show that the agents can find an $\epsilon$-approximate CCE in at most $\widetilde{O}(H^6 S A / \epsilon^2)$ episodes, where $S$ is the number of states, $A$ is the size of the largest individual action space, and $H$ is the length of an episode. This appears to be the first sample complexity result for learning in generic general-sum Markov games. Our results rely on a novel investigation of an anytime high-probability regret bound for OMD with a dynamic learning rate and weighted regret, which would be of independent interest. One key feature of our algorithm is that it is decentralized, in the sense that each agent has access only to its local information and is completely oblivious to the presence of others. In this way, our algorithm can readily scale up to an arbitrary number of agents, without suffering from exponential dependence on the number of agents.
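To make the policy-update subroutine concrete, below is a minimal sketch of an entropy-regularized OMD step (i.e., exponential weights) with a decaying learning rate and importance-weighted loss estimates, the generic ingredients the abstract names. It is written under stated assumptions and is not the paper's exact stabilized update; the function name `omd_step`, the learning-rate schedule, and the toy loss construction are illustrative choices, not taken from the paper.

```python
import numpy as np

def omd_step(pi, loss_hat, eta):
    """One entropy-regularized OMD (exponential-weights) update:
    pi_new(a) is proportional to pi(a) * exp(-eta * loss_hat[a])."""
    logits = np.log(pi) - eta * loss_hat
    logits -= logits.max()          # shift for numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum()

# Toy usage: bandit-style feedback with a decaying (dynamic) learning rate.
rng = np.random.default_rng(0)
A = 4                               # number of actions for this agent
pi = np.full(A, 1.0 / A)            # uniform initial policy
for t in range(1, 101):
    a = rng.choice(A, p=pi)         # sample own action; other agents unobserved
    loss = rng.uniform()            # loss observed for the chosen action only
    loss_hat = np.zeros(A)
    loss_hat[a] = loss / pi[a]      # importance-weighted loss estimate
    eta = np.sqrt(np.log(A) / t)    # dynamic learning rate, decaying in t
    pi = omd_step(pi, loss_hat, eta)
```

Because each agent updates from its own observed losses alone, the update is fully decentralized, matching the key feature highlighted in the abstract.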
Similar Resources
Taking turns in general sum Markov games
This paper provides a novel approach to multi-agent coordination in general sum Markov games. Contrary to what is common in multi-agent learning, our approach does not focus on reaching a particular equilibrium between agent policies. Instead, it learns a basis set of special joint agent policies, over which it can randomize to build different solutions. The main idea is to tackle a Markov game...
Hierarchical Multiagent Reinforcement Learning in Markov Games
Interactions between intelligent agents in multiagent systems can be modeled and analyzed by using game theory. The agents select actions that maximize their utility function so that they also take into account the behavior of the other agents in the system. Each agent should therefore utilize some model of the other agents. In this paper, the focus is on the situation which has a temporal stru...
Value-function reinforcement learning in Markov games
Markov games are a model of multiagent environments that are convenient for studying multiagent reinforcement learning. This paper describes a set of reinforcement-learning algorithms based on estimating value functions and presents convergence theorems for these algorithms. The main contribution of this paper is that it presents the convergence theorems in a way that makes it easy to reason ab...
QL2, a simple reinforcement learning scheme for two-player zero-sum Markov games
Markov games are a framework which formalises n-agent reinforcement learning. For instance, Littman proposed the minimax-Q algorithm to model two-agent zero-sum problems. This paper proposes a new simple algorithm in this framework, QL2, and compares it to several standard algorithms (Q-learning, Minimax and minimax-Q). Experiments show that QL2 converges to optimal mixed policies, as minimax-Q...
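For context on the minimax-Q baseline mentioned in this snippet, the value of a state in a two-player zero-sum Markov game, $V(s) = \max_\pi \min_o \sum_a \pi(a)\, Q(s,a,o)$, is obtained by solving a linear program. Below is a minimal sketch using `scipy.optimize.linprog`; the function name `minimax_value` is illustrative, not from the cited paper.

```python
import numpy as np
from scipy.optimize import linprog

def minimax_value(Q_s):
    """Solve max_pi min_o sum_a pi[a] * Q_s[a, o] as a linear program.

    Q_s: (n_actions, n_opponent_actions) payoff matrix at one state.
    Returns (value, pi): the game value and the maximizing mixed policy.
    """
    n_a, n_o = Q_s.shape
    # Variables: x = [pi_1, ..., pi_{n_a}, v]; maximize v == minimize -v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # For every opponent action o:  v - sum_a pi[a] * Q_s[a, o] <= 0.
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # The policy must be a probability distribution.
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_a + [(None, None)]  # v is unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:n_a]

Q_s = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching-pennies payoffs
v, pi = minimax_value(Q_s)                   # v ~ 0, pi ~ [0.5, 0.5]
```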
Multiagent reinforcement learning: algorithm converging to Nash equilibrium in general-sum discounted stochastic games
Reinforcement learning has turned out to be a technique that has allowed robots to ride a bicycle, computers to play backgammon at the level of human world masters, and such complicated high-dimensional tasks as elevator dispatching to be solved. Can it come to the rescue in the next generation of challenging problems, like playing football or bidding on virtual markets? Reinforcement learning that provides a way o...
Journal
Journal Title: Dynamic Games and Applications
Year: 2022
ISSN: 2153-0793, 2153-0785
DOI: https://doi.org/10.1007/s13235-021-00420-0